

A performance contextualization approach to validating camera models for robot simulation

arXiv.org Artificial Intelligence

This contribution focuses on camera simulation as it is used when simulating autonomous robots for virtual prototyping. We propose a camera model validation methodology based on the performance of a perception algorithm and on the context in which that performance is measured. This approach differs from traditional validation of synthetic images, which is often done at the pixel or feature level and tends to require matched pairs of synthetic and real images. Because acquiring paired images is costly and constrained, the proposed approach works with datasets that are not necessarily paired. Given a real dataset A and a simulated dataset B, we find subsets A_c and B_c with similar content and statistically compare the perception algorithm's responses to these subsets. This validation approach yields a statistical measure of performance similarity as well as a measure of content similarity between A and B. The methodology is demonstrated using images generated with Chrono::Sensor and a scaled autonomous vehicle, with an object detector as the perception algorithm. The results demonstrate the ability to quantify (i) differences between simulated and real data; (ii) the extent to which training methods mitigate the sim-to-real gap; and (iii) the context overlap between two datasets.
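The subset-matching and comparison steps can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-image context key (a hashable content summary, such as a binned object count), the per-image score, the overlap ratio used as the content-similarity measure, and the two-sample permutation test on mean scores are all hypothetical stand-ins for whatever context features and statistical test the paper actually uses.

```python
import random
from statistics import mean
from typing import Hashable, List, Sequence, Tuple

# Each record is (context_key, score). The context key is a hypothetical,
# hashable summary of image content; the score is a per-image performance
# value of the perception algorithm (e.g. detection recall).
Record = Tuple[Hashable, float]


def context_subsets(real: Sequence[Record], sim: Sequence[Record]):
    """Return (A_c scores, B_c scores, overlap) for unpaired datasets A and B."""
    shared = {c for c, _ in real} & {c for c, _ in sim}
    a_c = [s for c, s in real if c in shared]
    b_c = [s for c, s in sim if c in shared]
    # Assumed content-similarity measure: fraction of all images whose
    # context occurs in both datasets.
    overlap = (len(a_c) + len(b_c)) / (len(real) + len(sim))
    return a_c, b_c, overlap


def permutation_pvalue(x: List[float], y: List[float],
                       n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test on the difference of mean scores."""
    if not x or not y:
        raise ValueError("both subsets must be non-empty")
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(x)]) - mean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data with hypothetical context labels.
real = [("few-objects", 0.9), ("few-objects", 0.8),
        ("many-objects", 0.7), ("low-light", 0.2)]
sim = [("few-objects", 0.95), ("many-objects", 0.6),
       ("many-objects", 0.65), ("glare", 0.1)]

a_c, b_c, overlap = context_subsets(real, sim)
print(a_c, b_c, overlap)  # [0.9, 0.8, 0.7] [0.95, 0.6, 0.65] 0.75
print(permutation_pvalue(a_c, b_c))
```

A small p-value would suggest that the detector behaves differently on real and simulated images of comparable content; matching on context first keeps that difference from being driven merely by the two datasets containing different scenes. A permutation test is used here only because it needs no distributional assumptions and no third-party libraries.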


Grounded Semantic Composition for Visual Scenes

arXiv.org Artificial Intelligence

We present a visually grounded language understanding model based on a study of how people verbally describe objects in scenes. The model emphasizes how individual word meanings combine to produce meanings for complex referring expressions. The model has been implemented, and it is able to understand a broad range of spatial referring expressions. We describe our implementation of word-level visually grounded semantics and their embedding in a compositional parsing framework. The implemented system selects the correct referents in response to natural language expressions for a large percentage of test cases. In an analysis of the system's successes and failures, we reveal how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.
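As a toy illustration of compositional reference resolution, the sketch below treats each word meaning as a function that narrows a set of candidate objects and composes these functions to interpret a referring expression. It is a hypothetical simplification, not the paper's model: the `Obj` scene representation, the small hand-written lexicon, and the right-to-left composition rule are assumptions, whereas the paper's word meanings are visually grounded rather than hand-coded predicates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Obj:
    """A scene object (hypothetical representation)."""
    name: str
    color: str
    x: float  # horizontal position, larger is further right
    y: float  # distance from the viewer, larger is farther


# A word meaning narrows a list of candidate referents.
Meaning = Callable[[List[Obj]], List[Obj]]


def color_filter(c: str) -> Meaning:
    """Adjective meaning: keep only candidates of color c."""
    return lambda objs: [o for o in objs if o.color == c]


def extreme(key: Callable[[Obj], float], largest: bool) -> Meaning:
    """Superlative meaning: keep the single candidate with the extreme key."""
    def select(objs: List[Obj]) -> List[Obj]:
        if not objs:
            return []
        pick = max if largest else min
        return [pick(objs, key=key)]
    return select


# Hypothetical lexicon mapping words to meanings.
LEXICON: Dict[str, Meaning] = {
    "green": color_filter("green"),
    "purple": color_filter("purple"),
    "leftmost": extreme(lambda o: o.x, largest=False),
    "rightmost": extreme(lambda o: o.x, largest=True),
    "frontmost": extreme(lambda o: o.y, largest=False),
}


def interpret(phrase: str, scene: List[Obj]) -> List[Obj]:
    """Compose word meanings right to left; words not in LEXICON are skipped."""
    candidates = list(scene)
    for word in reversed(phrase.lower().split()):
        meaning = LEXICON.get(word)
        if meaning is not None:
            candidates = meaning(candidates)
    return candidates


SCENE = [
    Obj("a", "purple", x=0.1, y=0.3),
    Obj("b", "green", x=0.4, y=0.8),
    Obj("c", "green", x=0.7, y=0.2),
    Obj("d", "purple", x=0.9, y=0.6),
]

print([o.name for o in interpret("the leftmost green one", SCENE)])  # ['b']
print([o.name for o in interpret("the green leftmost one", SCENE)])  # []
```

Because "leftmost" is applied to the output of "green", the first phrase picks the leftmost of the green objects; reversing the order first selects the leftmost object overall (which is purple) and then finds no green match, so the order of composition changes the referent.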


Grounded Semantic Composition for Visual Scenes

Journal of Artificial Intelligence Research
